The first map shows the age-adjusted rate of emergency department (ED) visits for asthma per 10,000 people in the Bay Area, averaged over 2015 to 2017. In most of the Bay Area the rate ranges from 50 to 150, with the highest rates in the central and western Bay Area, such as Vallejo (more than 200), followed by San Leandro.
The second map shows the annual mean concentration of PM2.5 in the Bay Area, averaged over 2015 to 2017. PM2.5 concentrations are lowest in the northern and southern parts of the Bay Area, at about 6 to 7 micrograms per cubic meter (µg/m³). The highest levels occur in clusters, mostly in the middle of the Bay Area, such as Oakland and Napa at about 10 µg/m³.
The best-fit line does not appear to represent the data well: there are many clusters of points far above and below it. In particular, tracts with PM2.5 levels of around 8 to 9 µg/m³ and asthma rates of 150 to 250 lie well above the line.
##
## Call:
## lm(formula = Asthma ~ PM2.5, data = ces4_clean2)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -54.47 -25.89  -9.61  12.94 182.95
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37.49 on 1578 degrees of freedom
## Multiple R-squared: 0.09606, Adjusted R-squared: 0.09549
## F-statistic: 167.7 on 1 and 1578 DF, p-value: < 2.2e-16
An increase of 1 µg/m³ in PM2.5 is associated with nearly 20 more asthma ED visits per 10,000 people. PM2.5 explains only about 9.6% of the variation in asthma ED visit rates (R² = 0.096; adjusted R² = 0.0955).
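As a quick arithmetic check of this interpretation, the sketch below plugs the reported coefficients back into the fitted line and recomputes the adjusted R² from the multiple R². It uses only the numbers printed in the summary above; the variable names and the helper `pred_rate()` are illustrative, not part of the original analysis.

```r
# Reported fit of lm(Asthma ~ PM2.5), copied from the summary output
b0 <- -116.278   # intercept
b1 <- 19.862     # PM2.5 slope (visits per 10,000 per 1 ug/m^3)
r2 <- 0.09606    # Multiple R-squared

# Hypothetical helper: predicted age-adjusted ED visit rate at a PM2.5 level
pred_rate <- function(pm25) b0 + b1 * pm25

# In a straight-line model, a 1 ug/m^3 step always adds exactly the slope
step_effect <- pred_rate(9) - pred_rate(8)
step_effect

# Adjusted R-squared penalizes R-squared for the number of predictors p;
# n = residual df (1578) + number of estimated coefficients (2)
n <- 1578 + 2
p <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 5)  # 0.09549, matching the reported value
```

Because n is large relative to the single predictor, the penalty is tiny, which is why the two R² values differ only in the fourth decimal place.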
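The mean of the residuals is zero, as it always is for a least-squares fit with an intercept, but the density curve of the residuals is right-skewed: its peak sits below zero and it has a long right tail (the median residual is -9.61, while the maximum is 182.95). This suggests that the residuals are not normally distributed. In other words, the model's errors are not simply random noise: it slightly overestimates most tracts while badly underestimating a minority of tracts with very high asthma rates.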
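The residual checks described above can be sketched in a few lines of base R. Since `ces4_clean2` is not reproduced here, this sketch assumes a hypothetical simulated data set with the same column names, a linear trend, and deliberately right-skewed (exponential) noise; the `skewness()` function is a hand-written moment estimator, not a library call.

```r
set.seed(1)
n <- 1580
# Hypothetical data: PM2.5 levels plus a linear trend with right-skewed noise
pm25 <- runif(n, min = 6, max = 11)
noise <- rexp(n, rate = 1 / 30) - 30   # exponential noise shifted to mean 0
sim <- data.frame(PM2.5 = pm25, Asthma = -116 + 20 * pm25 + noise)

fit <- lm(Asthma ~ PM2.5, data = sim)
res <- residuals(fit)

# Moment-based sample skewness: > 0 means a long right tail
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

mean_res <- mean(res)       # zero up to floating-point error (OLS with intercept)
skew_res <- skewness(res)   # clearly positive for this right-skewed noise
mean_res
skew_res
# plot(density(res)) would draw the residual density curve itself
```

The mean being zero says nothing about the shape of the distribution, which is why a skewness measure (or the density plot) is the more informative check.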
With log-transformed asthma rates, the model fits better: although there are still large clusters above and below the line, the points are spread about as far above the line as below it.
##
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = ces4_clean2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.00402 -0.46479  0.03313  0.42298  1.75525
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.69234    0.22840   3.031  0.00248 **
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6566 on 1578 degrees of freedom
## Multiple R-squared: 0.1003, Adjusted R-squared: 0.09974
## F-statistic: 175.9 on 1 and 1578 DF, p-value: < 2.2e-16
An increase of 1 µg/m³ in PM2.5 is associated with asthma ED visit rates that are e^0.35633 ≈ 1.43 times higher, i.e. an increase of about 43%. PM2.5 explains about 10% of the variation in log(Asthma) (R² = 0.100; adjusted R² = 0.0997).
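Because the response is on the log scale, the slope acts multiplicatively on the original rate. The sketch below, using only the reported coefficients, shows that the ratio of back-transformed predictions one unit apart is the same at any PM2.5 level. The helper `pred_rate()` is illustrative; note that exponentiating a fitted log value estimates a typical (geometric-mean) rate rather than the arithmetic mean rate.

```r
# Reported coefficients of lm(log(Asthma) ~ PM2.5)
b0 <- 0.69234
b1 <- 0.35633

ratio <- exp(b1)                 # multiplicative change per 1 ug/m^3
pct_change <- 100 * (ratio - 1)  # same effect expressed as a percentage

# Hypothetical helper: back-transformed prediction on the original rate scale
pred_rate <- function(pm25) exp(b0 + b1 * pm25)

# The ratio of predictions one unit apart equals exp(b1) at any PM2.5 level
pred_rate(9) / pred_rate(8)
pred_rate(11) / pred_rate(10)
round(pct_change, 1)   # 42.8
```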
The residual distribution is now closer to normal: there is less skew, and the residuals are spread about evenly on either side of zero (the median residual is 0.033).
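One way to put a number on "less skew" using only the two printed residual summaries is the quartile (Bowley) skewness, together with how far the largest residual reaches relative to the smallest. This is a sketch of one possible summary, not the diagnostic used in the original analysis; `bowley()` is a hypothetical helper.

```r
# Quartile (Bowley) skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1)
# Ranges from -1 to 1; 0 means the middle half is symmetric about the median
bowley <- function(q1, med, q3) (q3 + q1 - 2 * med) / (q3 - q1)

# Residual summaries copied from the two summary() outputs
res_lin <- c(min = -54.47, q1 = -25.89, med = -9.61, q3 = 12.94, max = 182.95)
res_log <- c(min = -2.00402, q1 = -0.46479, med = 0.03313, q3 = 0.42298, max = 1.75525)

skew_lin <- bowley(res_lin[["q1"]], res_lin[["med"]], res_lin[["q3"]])  # about 0.16
skew_log <- bowley(res_log[["q1"]], res_log[["med"]], res_log[["q3"]])  # about -0.12

# Tail balance: largest residual relative to the size of the smallest
tail_lin <- res_lin[["max"]] / abs(res_lin[["min"]])  # about 3.4
tail_log <- res_log[["max"]] / abs(res_log[["min"]])  # about 0.88
c(skew_lin = skew_lin, skew_log = skew_log, tail_lin = tail_lin, tail_log = tail_log)
```

A tail ratio near 1 means the residuals reach about as far above the line as below it, which the log model comes much closer to than the untransformed model.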
The census tract with the most negative residual is 6085513000, at Stanford University, with a residual of -2.00402. A negative residual means that the regression line overestimated the asthma ED visit rate in this tract for its level of PM2.5; on the original scale, the observed rate is only about e^-2.00402 ≈ 0.13 times the predicted rate. This may be a distortion of the age-adjusted rate: Stanford's population is largely students, so other age groups that may be more vulnerable to serious asthma events are underrepresented. Stanford may also have good healthcare resources that help asthmatic individuals before asthma events escalate into emergencies.
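Finding the most overestimated tract amounts to taking `which.min()` of the model residuals. The sketch below assumes a hypothetical helper `most_overpredicted()` and a toy data set (the real tract IDs and `ces4_clean2` are not reproduced here), then back-transforms the reported Stanford residual.

```r
# Hypothetical helper: the observation a fitted lm() most overpredicts
# (most negative residual). Assumes `ids` lines up with the rows used in
# the fit, i.e. no rows were dropped because of missing values.
most_overpredicted <- function(fit, ids) {
  res <- residuals(fit)
  i <- which.min(res)
  list(id = ids[[i]], residual = unname(res[[i]]))
}

# Toy example: observation "d" sits far below an otherwise straight trend
toy <- data.frame(x = 1:6, y = c(2, 4, 6, 2, 10, 12))
toy_fit <- lm(y ~ x, data = toy)
worst <- most_overpredicted(toy_fit, ids = c("a", "b", "c", "d", "e", "f"))
worst$id   # "d"

# Back-transform the Stanford tract's log-scale residual:
# observed rate / predicted rate on the original scale
obs_over_pred <- exp(-2.00402)   # about 0.135
obs_over_pred
```

If the data contain missing values, fitting with `na.action = na.exclude` keeps `residuals()` aligned with the original rows, so the IDs stay matched.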